Day 17 - Regular expressions - Groups
$ echo "alpha10" | sed -r s,"([a-z]+([0-9]+))","Full code:\1 - Number:\2",
Full code:alpha10 - Number:10
you will realise that the inner group ([0-9]+) matches the digits 10, while the outer group matches
both the letters and the digits, acting as if it were ([a-z]+[0-9]+), without the internal group.
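As a small sketch of how nested groups are numbered, the commands below reuse the same expression on a different input. The string beta42 and the variable names nested and flat are illustrative only, and the whole sed expression is wrapped in single quotes for brevity. Groups are numbered by the position of their opening parenthesis, so the outer group is \1 and the inner one is \2.

```shell
# Groups are numbered by their opening parenthesis:
# \1 is the outer group, \2 is the inner one.
nested=$(echo "beta42" | sed -r 's,([a-z]+([0-9]+)),Full code:\1 - Number:\2,')
echo "$nested"

# Without the inner parentheses the outer group matches the same text,
# but there is no second group to reference any more.
flat=$(echo "beta42" | sed -r 's,([a-z]+[0-9]+),Full code:\1,')
echo "$flat"
```

The first command should print Full code:beta42 - Number:42 and the second Full code:beta42, which shows that the inner parentheses add a second reference without changing what the outer group matches.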
When you use groups in the replacement string you are not forced to keep them in order, so for
example
$ echo "First,Second" | sed -r s/"(.*),(.*)"/"\2,\1"/
Second,First
is a very simple way to swap two fields in a comma-separated string. Well, maybe you don’t think
it is that simple, so let’s review it together. First of all, I used a / to separate the search and
replacement strings because the search string contains a comma. The first group matches everything
up to the comma (with more than one comma, the greedy .* would stop at the last one), after which
the second group captures the rest. In the replacement string I print the content of the second
group, then a comma, and then the content of the first group.
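The swap can be tried on a string with more than one comma to see where the first group actually stops. This is only a sketch: the input a,b,c and the variable names greedy and first are made up for illustration, and the [^,]* variant is one possible way to split at the first comma rather than something the example above requires.

```shell
# .* is greedy: with several commas the first group extends to the LAST comma.
greedy=$(echo "a,b,c" | sed -r 's/(.*),(.*)/\2,\1/')
echo "$greedy"

# [^,]* cannot cross a comma, so here the first group stops at the FIRST comma.
first=$(echo "a,b,c" | sed -r 's/([^,]*),(.*)/\2,\1/')
echo "$first"
```

The first command should print c,a,b and the second b,c,a. With exactly two fields, as in First,Second, both expressions give the same result.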
We can use groups in grep as well, even though the reuse of the matched values is obviously
limited by the fact that grep doesn’t replace text. The best use of groups in grep is with the
so-called lookaround expressions, which are provided by the Perl-compatible regular expression
syntax (PCRE), activated by the -P switch, as opposed to the -E switch that we have used so far,
which activates the Extended syntax (ERE). The differences between these syntaxes are outside the
scope of this book, so feel free to investigate the matter online.
Lookaround expressions can provide information about the surroundings of a matching pattern
without including the surroundings themselves. Let’s look at an example: the simplest form of
lookaround is the positive lookahead, where a group specifies what should follow the matching
part of the string
$ cat examples.txt | grep -P "[A-Za-z ]+(?=[0-9]+)"
Police 101
H2O
R2-D2
Johnny 5
Cyborg 009
Here, the matching regular expression is [A-Za-z ]+, that is, strings of lowercase letters, uppercase
letters, and spaces. We are, however, interested only in those strings that are also followed by one
or more digits, which is what (?=[0-9]+) expresses. The effect is clear if you have coloured output in your shell, or if you use the
-o option